Hands-on Machine Learning with R
Bookclub R-Ladies Utrecht and R-Ladies Den Bosch
Intro (Chapter 1 & 2) - Gerbrich Ferdinands
Feature & Target Engineering (Chapter 3) - Ale Segura
Linear & Logistic regression (Chapter 4 & 5) - Martine Jansen
Regularized regression (Chapter 6) - Marianna Sebő
MARS & K-nearest neighbors (Chapter 7 & 8) - Elena Dudukina
Make predictions by asking simple questions about features
Non-parametric, similar responses are grouped by splitting rules
Easy to interpret and visualize with tree diagrams
Downside: often perform worse than more complex algorithms
Classification and regression trees (CART)
Data is partitioned into similar subgroups
Each subgroup (or node) is created by asking simple yes/no questions about each feature (e.g., is age < 18?)
This is done repeatedly, until a stopping criterion is reached (e.g., maximum depth)
Regression trees predict the average response value in a subgroup; classification trees predict the most common class in a subgroup
Binary recursive partitioning
Objective at each node: find the “best” feature/split combination
The splitting process is then repeated in each of the two regions
Features can be used multiple times in the same tree
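The partitioning above can be sketched with `rpart` and `rpart.plot`. A minimal example using the built-in `mtcars` data (not the book's Ames housing data):

```r
library(rpart)       # CART implementation
library(rpart.plot)  # tree diagrams

# Fit a regression tree predicting fuel efficiency (mpg)
fit <- rpart(mpg ~ ., data = mtcars, method = "anova")

# Each leaf shows the average mpg of the observations falling into it
rpart.plot(fit)

# Inspect the chosen splits; a feature can reappear at several depths
print(fit)
```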
# Helper packages
library(dplyr) # for data wrangling
library(ggplot2) # for awesome plotting
# Modeling packages
library(rsample) # for sampling the data
library(rpart) # direct engine for decision tree application
library(caret) # meta engine for decision tree application
library(ipred) # bagging
# Model interpretability packages
library(rpart.plot) # for plotting decision trees
library(vip) # for feature importance
library(pdp) # for feature effects
Automated feature selection: uninformative features are not used in the model
Easy to explain, visually appealing
Require little preprocessing
Not sensitive to outliers or missing data
Can handle mix of categorical and numeric features
Not the best predictors (other models we’ve seen so far are better at predicting)
Simple yes/no questions result in rigid, non-smooth boundaries
Deep trees: low bias, high variance (risk of overfitting)
Shallow trees: high bias, low variance (risk of underfitting)
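This trade-off is usually managed by growing a deep tree and pruning it back with cross-validation. A sketch with `rpart` on the built-in `mtcars` data (an assumption, not the book's example data):

```r
library(rpart)

# Grow a deep tree first: cp = 0 allows many splits (low bias, high variance)
deep <- rpart(mpg ~ ., data = mtcars,
              control = rpart.control(cp = 0, minsplit = 2))

# Cross-validated error for each subtree size
plotcp(deep)

# Prune back to the complexity parameter with the lowest cross-validated error
best_cp <- deep$cptable[which.min(deep$cptable[, "xerror"]), "CP"]
pruned  <- prune(deep, cp = best_cp)
```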
Fit multiple prediction models and take the average
By model averaging, bagging helps to reduce variance and minimize overfitting
Especially useful for unstable, high variance models (where predicted output undergoes major changes in response to small changes in the training data)
Create b bootstrap copies of the original training data
Fit your algorithm (commonly referred to as the base learner) to each bootstrap sample
New predictions are made by averaging predictions of the individual base learners
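The three steps above can be run with `ipred::bagging`. A minimal sketch on the built-in `mtcars` data (an assumption; the book works with the Ames data):

```r
library(ipred)

set.seed(123)                      # bootstrap sampling is random
bag <- bagging(
  mpg ~ ., data = mtcars,
  nbagg = 100,                     # b = 100 bootstrap copies / base learners
  coob  = TRUE                     # estimate error on out-of-bag samples
)

# Predictions average the 100 individual trees
predict(bag, newdata = head(mtcars))
bag$err                            # out-of-bag error estimate
```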
Bagging 50-500 decision trees typically leads to optimal performance
A single pruned decision tree performs worse than MARS or KNN
100 unpruned, bagged decision trees perform better
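Such comparisons can be made with cross-validation through `caret`'s `"treebag"` method, which wraps `ipred`. A sketch on the built-in `mtcars` data (an assumption; the book compares models on the Ames data):

```r
library(caret)

set.seed(123)
cv_bag <- train(
  mpg ~ ., data = mtcars,
  method    = "treebag",
  trControl = trainControl(method = "cv", number = 10),
  nbagg     = 100                  # 100 unpruned, bagged trees
)
cv_bag$results$RMSE                # cross-validated error
```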
Pro
Bagging improves prediction accuracy for high variance (and low bias) models
Con
Depending on the number of iterations, bagging can become computationally intensive
An ensemble of trees is harder to interpret than a single tree
We’re still looking for presenters, so let us know if you’re interested :)
R-Ladies theme for Quarto Presentations. Code available on GitHub.